Admin

Load packages, utility functions, global variables

source("~scripts/00 - Admin.R")
source("~scripts/01 - Utility Functions.R")

Explore OSM data

Resources

Vignette: https://cran.r-project.org/web/packages/osmdata/vignettes/osmdata.html

Map features: https://wiki.openstreetmap.org/wiki/Map_Features

?available_features() and ?available_tags([feature])

Baltimore

Query parameters

List of parameters:

  • City name
  • key
  • value
Balt_bbox <- getbb("Baltimore") %>% 
  opq()

Transit

Query the bus stops in each city.

Tags used:

  • public_transport=*
source("~scripts/11 - Read Transit SDG data.R")

# Balt_busStops <- Balt_bbox %>% 
#   add_osm_feature(key = "highway", value = "bus_stop") %>% 
#   osmdata_sf() %>% 
#   # keep only the points. Note that the query returned 4333 points, 9 polygons, and 1 multi-line feature
#   .$osm_points

Make quick maps using only the point data

# source("~scripts/30 - Read basemaps.R") 
sdg_basemaps <- readRDS("~objects/30/30_sdg_basemaps.rds")
ggmap(Balt_map) +
  geom_sf(data = Balt_busStops,
          inherit.aes = FALSE, # this is necessary
          alpha = 0.5) +
  labs(title = "Bus Stops in Baltimore",
       caption = "Data source: OSM",
       x = "",
       y = "")
Maps of each city
Baltimore

Minneapolis

New Orleans

Query some historical data

API admin

base_url <- "https://api.ohsome.org/v0.9"
api_metadata <- GET(url= paste(base_url, "/metadata", sep = "")) %>% 
  content("text") %>% 
  fromJSON()

The below tells us the database contains data from October 8, 2007 to May 23, 2020.

api_metadata$extractRegion$temporalExtent
## $fromTimestamp
## [1] "2007-10-08T00:00:00Z"
## 
## $toTimestamp
## [1] "2020-06-17T03:00:00Z"

Aggregation endpoint

Use this endpoint to aggregate OSM data, e.g. counts, areas, lengths, and users (contributors).

Let’s look at the trends for mapping bus stops in Baltimore.

# api_bbox <- getbb("Baltimore") %>% bbox_to_string() # note that the order of the coords is flipped from what the database needs
balt_bbox <- "-76.770759, 39.1308816,-76.450759,39.4508816"
# api_bbox <- "8.6581,49.3836,8.7225,49.4363"

api_keys <- "highway"
api_values <- "bus_stop"

monthly <- "2007-11-01/2020-05-23/P1M"

api_data <- GET(url = paste(base_url, "/elements/count", sep = ""),
                query = list(
                  bboxes = balt_bbox,
                  keys = api_keys,
                  values = api_values,
                  time = monthly))

busStops_hist <- content(api_data, as = "text") %>% 
  fromJSON() %>% 
  .$result

The query from osmdata showed 4333 bus stops in Baltimore currently. The OSHDB query shows 4241 bus stops as of May 1, 2020.

ggplot(busStops_hist, 
       aes(x = as.Date(timestamp),
           y = value)) +
  geom_line() +
  theme_bw() +
  labs(title = "Bus Stops in Baltimore Mapped In OSM Over Time",
       caption = "Source: OSHDB, ohsome API",
       x = "Date",
       y = "Count")

Extraction endpoint

Use this endpoint to pull historical snapshots of OSM features.

The column names for the resulting dataframe seem odd - is there a better way to convert the API geoJSON response into sf?

# is there a more elegant way to do the below?
extraction_api_data <- GET(url= paste(base_url, "/elements/geometry", sep = ""),
                query = list(
                  bboxes = balt_bbox,
                  keys = api_keys,
                  values = api_values,
                  types = "point", # do I want to limit it to points only?
                  time = monthly))
busStops_geom_hist <- read_sf(extraction_api_data)
busStops_geom_hist$X.snapshotTimestamp <- as.Date(busStops_geom_hist$X.snapshotTimestamp)

busStops_geom_hist
## Simple feature collection with 84352 features and 3 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -76.77029 ymin: 39.13266 xmax: -76.45081 ymax: 39.44865
## geographic CRS: WGS 84
## # A tibble: 84,352 x 4
##    X.osmId         X.snapshotTimestamp highway             geometry
##    <chr>           <date>              <chr>            <POINT [°]>
##  1 node/1806310072 2013-04-01          bus_stop (-76.46212 39.3776)
##  2 node/1806310072 2013-05-01          bus_stop (-76.46212 39.3776)
##  3 node/1806310072 2013-06-01          bus_stop (-76.46212 39.3776)
##  4 node/1806310072 2013-07-01          bus_stop (-76.46212 39.3776)
##  5 node/1806310072 2013-08-01          bus_stop (-76.46212 39.3776)
##  6 node/1806310072 2013-09-01          bus_stop (-76.46212 39.3776)
##  7 node/1806310072 2013-10-01          bus_stop (-76.46212 39.3776)
##  8 node/1806310072 2013-11-01          bus_stop (-76.46212 39.3776)
##  9 node/1806310072 2013-12-01          bus_stop (-76.46212 39.3776)
## 10 node/1806310072 2014-01-01          bus_stop (-76.46212 39.3776)
## # ... with 84,342 more rows
# busStops_geom_hist <- content(extraction_api_data, as = "text") %>%
#   fromJSON()

# write(busStops_geom_hist %>% toJSON(), "test.json")

According to our aggregated data, there was a massive spike in bus stops in Baltimore on OSM at the beginning of February, 2019: from 280 to 3968.

First, does the geometry data have the same number of observations as the aggregated data? It’s very close - it may be a matter of the time of day queried.

busStops_changeMap <- busStops_geom_hist %>% 
  filter(X.snapshotTimestamp %in% as.Date(c("2019-02-01", "2019-01-01")))

busStops_changeMap %>% group_by(X.snapshotTimestamp) %>% 
  summarize(count = n())
## Simple feature collection with 2 features and 2 fields
## geometry type:  MULTIPOINT
## dimension:      XY
## bbox:           xmin: -76.77029 ymin: 39.13266 xmax: -76.45081 ymax: 39.44865
## geographic CRS: WGS 84
## # A tibble: 2 x 3
##   X.snapshotTimesta~ count                                              geometry
##   <date>             <int>                                      <MULTIPOINT [°]>
## 1 2019-01-01           272 ((-76.7639 39.18362), (-76.76289 39.31459), (-76.761~
## 2 2019-02-01          3960 ((-76.77029 39.41797), (-76.77025 39.41795), (-76.77~

Next, compare to two months side-by-side on a map.

ggmap(sdg_basemaps$Baltimore) +
  geom_sf(data = busStops_changeMap,
          inherit.aes = FALSE, 
          alpha = 0.5) +
  labs(title = "Change in Bus Stops Mapped on OSM in Baltimore",
       subtitle = "Number of bus stops jumped from 280 in January 2019 to 3968 in February 2019.",
       caption = "Data source: OSHDB",
       x = "",
       y = "") +
  facet_wrap(~ X.snapshotTimestamp)